Appendix A Control algorithm

The action-value function can be decomposed into two components as: $Q^{(PT)}(s, a) = Q^{(P)}(s, a) + Q^{(T)}_w$
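As a rough illustration of this decomposition, the sketch below stores the permanent and transient components as two separate tables and forms the combined estimate by summing them. The tabular representation, the function names `combined_q` and `greedy_action`, the greedy action choice on the combined estimate, and the toy numbers are all assumptions made for illustration; they are not taken from the algorithm in this appendix.

```python
from typing import List

Table = List[List[float]]  # indexed as table[state][action]


def combined_q(q_permanent: Table, q_transient: Table, s: int, a: int) -> float:
    """Return the combined estimate Q^(PT)(s, a) = Q^(P)(s, a) + Q^(T)(s, a)."""
    return q_permanent[s][a] + q_transient[s][a]


def greedy_action(q_permanent: Table, q_transient: Table, s: int) -> int:
    """Pick the action with the largest combined estimate in state s.

    Assumption: acting greedily on the sum; ties go to the lowest action index.
    """
    n_actions = len(q_permanent[s])
    return max(range(n_actions),
               key=lambda a: combined_q(q_permanent, q_transient, s, a))


# Toy values, made up for illustration: 2 states, 3 actions.
q_p = [[0.5, 0.25, 0.0], [1.0, 0.0, 0.5]]    # permanent component
q_t = [[0.0, 0.5, 0.25], [-0.75, 0.0, 0.0]]  # transient component

print(combined_q(q_p, q_t, 0, 1))   # 0.25 + 0.5 = 0.75
print(greedy_action(q_p, q_t, 0))   # 1
print(greedy_action(q_p, q_t, 1))   # 2
```

In state 1 the transient component lowers the estimate of action 0 from 1.0 to 0.25, so the greedy choice shifts to action 2. This shows how a correction stored separately can change behavior without overwriting the permanent table.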
We use induction to prove this statement. The penultimate step follows from the induction hypothesis, completing the proof. Then, the fixed point of Eq. (5) is the value function of the policy in $\widetilde{M}$. We focus on the permanent value function in the next two theorems. The permanent value function is updated using Eq.